Abstract

Subspace clustering methods based on $\ell_1$, $\ell_2$ or nuclear norm regularization have become very popular due to their simplicity, theoretical guarantees and empirical success. However, the choice of the regularizer can greatly impact both theory and practice. For instance, $\ell_1$ regularization is guaranteed to give a subspace-preserving affinity (i.e., there are no connections between points from different subspaces) under broad conditions (e.g., arbitrary subspaces and corrupted data). However, it requires solving a large-scale convex optimization problem. On the other hand, $\ell_2$ and nuclear norm regularization provide efficient closed-form solutions, but require very strong assumptions to guarantee a subspace-preserving affinity, e.g., independent subspaces and uncorrupted data. In this paper we study a subspace clustering method based on orthogonal matching pursuit. We show that the method is both computationally efficient and guaranteed to give a subspace-preserving affinity under broad conditions. Experiments on synthetic data verify our theoretical analysis, and applications in handwritten digit and face clustering show that our approach achieves the best trade-off between accuracy and efficiency.

1. Introduction 

In many computer vision applications, such as motion segmentation [10, 35, 28], handwritten digit clustering [41] and face clustering [4, 21], data from different classes can be well approximated by a union of low-dimensional subspaces. In these scenarios, the task is to partition the data according to the membership of data points to subspaces.

More formally, given a set of points $\mathcal{X} = \{x_j \in \mathbb{R}^D\}_{j=1}^N$ lying in an unknown number $n$ of subspaces $\{S_i\}_{i=1}^n$ of unknown dimensions $\{d_i\}_{i=1}^n$, subspace clustering is the problem of clustering the data into groups such that each group contains only data points from the same subspace. This problem has received great attention in the past decade and many subspace clustering algorithms have been developed, including iterative, algebraic, statistical, and spectral clustering based methods (see [33] for a review).

Sparse and Low Rank Methods. Among existing techniques, methods based on applying spectral clustering to an affinity matrix obtained by solving an optimization problem that incorporates $\ell_1$, $\ell_2$ or nuclear norm regularization have become extremely popular due to their simplicity, theoretical correctness, and empirical success. These methods are based on the so-called self-expressiveness property of data lying in a union of subspaces, originally proposed in [13]. This property states that each point in a union of subspaces can be written as a linear combination of other data points in the subspaces. That is,

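$$x_j = X c_j \ \text{and}\ c_{jj} = 0, \quad \text{or equivalently} \quad X = XC \ \text{and}\ \operatorname{diag}(C) = 0, \qquad (1)$$

where $X = [x_1, \dots, x_N] \in \mathbb{R}^{D \times N}$ is the data matrix and $C = [c_1, \dots, c_N] \in \mathbb{R}^{N \times N}$ is the matrix of coefficients.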

While (1) may not have a unique solution for $C$, there exist solutions whose entries are such that if $c_{ij} \neq 0$, then $x_i$ is in the same subspace as $x_j$. For example, a point $x_j \in S_i$ can always be written as a linear combination of $d_i$ other points in $S_i$. Such solutions are called subspace preserving since they preserve the clustering of the subspaces. Given a subspace-preserving $C$, one can build an affinity matrix $W$ between every pair of points $x_i$ and $x_j$ as $W_{ij} = |c_{ij}| + |c_{ji}|$, and apply spectral clustering [36] to $W$ to cluster the data.
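The self-expressiveness property (1) and the affinity construction can be checked on a tiny example. The following plain-Python sketch is illustrative only: the five toy points, the hand-picked coefficients, and the helper names `matmul` and `affinity` are assumptions of this sketch, not part of the method description.

```python
def matmul(A, B):
    """Plain-Python product of two matrices stored as lists of rows."""
    return [[sum(A[r][k] * B[k][c] for k in range(len(B))) for c in range(len(B[0]))]
            for r in range(len(A))]


def affinity(C):
    """Symmetric affinity W_ij = |c_ij| + |c_ji| built from a coefficient matrix C."""
    N = len(C)
    return [[abs(C[i][j]) + abs(C[j][i]) for j in range(N)] for i in range(N)]


# Columns of X are the points (D = 3, N = 5):
# x0 = e1, x1 = e2, x2 = e1 + e2 lie in S_1 = span{e1, e2}; x3 = e3, x4 = 2 e3 lie in S_2 = span{e3}.
X = [[1, 0, 1, 0, 0],
     [0, 1, 1, 0, 0],
     [0, 0, 0, 1, 2]]

# Hypothetical subspace-preserving C, with C[i][j] = coefficient of x_i in the representation of x_j.
C = [[0.0] * 5 for _ in range(5)]
C[1][0], C[2][0] = -1.0, 1.0   # x0 = -x1 + x2
C[0][1], C[2][1] = -1.0, 1.0   # x1 = -x0 + x2
C[0][2], C[1][2] = 1.0, 1.0    # x2 = x0 + x1
C[4][3] = 0.5                  # x3 = 0.5 * x4
C[3][4] = 2.0                  # x4 = 2 * x3

assert matmul(X, C) == X                      # self-expressiveness: X = XC
assert all(C[j][j] == 0 for j in range(5))    # diag(C) = 0
W = affinity(C)
print(W[0][1], W[3][4], W[0][3])  # 2.0 2.5 0.0
```

Because $C$ is subspace preserving, every entry of $W$ linking $S_1$ and $S_2$ is zero, which is exactly what spectral clustering needs to separate the two groups.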

To find a subspace-preserving $C$, existing methods regularize $C$ with a norm $\|\cdot\|$, and solve a problem of the form:

$$C^* = \arg\min_C \|C\| \quad \text{s.t.} \quad X = XC,\ \operatorname{diag}(C) = 0. \qquad (2)$$

For instance, the sparse subspace clustering (SSC) algorithm [13] uses the $\ell_1$ norm to encourage the sparsity of $C$. Prior work has shown that SSC gives a subspace-preserving solution if the subspaces are independent [13, 15], or if the data from different subspaces satisfy certain separation conditions and data from the same subspace are well spread out [14, 15, 40, 30]. Similar results exist for cases where data is corrupted by noise [37, 31] and outliers [30]. Other self-expressiveness based methods use different regularizations on the coefficient matrix $C$. Least squares regression (LSR) [25] uses $\ell_2$ regularization on $C$. Low rank representation (LRR) [23, 22] and low rank subspace clustering (LRSC) [16, 34] use nuclear norm minimization to encourage $C$ to be low-rank. Based on these, [24, 26, 19, 39] study regularizations that are a mixture of $\ell_2$ and $\ell_1$, and [38, 42] propose regularizations that are a blend of $\ell_1$ and the nuclear norm.

The advantage of $\ell_2$ regularized LSR and nuclear norm regularized LRR and LRSC over sparsity regularized SSC is that the solution for $C$ can be computed in closed form from the SVD of the (noiseless) data matrix $X$, thus they are computationally more attractive. However, the resulting $C$ is subspace preserving only when subspaces are independent and the data is uncorrupted. Thus, there is a need for methods that both guarantee a subspace-preserving affinity under broad conditions and are computationally efficient.

Paper Contributions. In this work we study the self-expressiveness based subspace clustering method that uses orthogonal matching pursuit (OMP) to find a sparse representation in lieu of the $\ell_1$-based basis pursuit (BP) method. The method is termed SSC-OMP, for its kinship to the original SSC, which is referred to as SSC-BP in this paper.

The main contributions of this paper are to find theoretical conditions under which the affinity produced by SSC-OMP is subspace preserving and to demonstrate its efficiency for large-scale problems. Specifically, we show that:

1. When the subspaces and the data are deterministic, SSC-OMP gives a subspace-preserving $C$ if the subspaces are independent, or else if the subspaces are sufficiently separated and the data is well distributed.

2. When the subspaces and data are drawn uniformly at random, SSC-OMP gives a subspace-preserving $C$ if the dimensions of the subspaces are sufficiently small relative to the ambient dimension by a factor controlled by the sample density and the number of subspaces.

3. SSC-OMP is orders of magnitude faster than the original SSC-BP, and can handle up to 100,000 data points.

Related work. It is worth noting that the idea of using OMP for SSC had already been considered in [12]. The core contribution of our work is to provide much weaker yet more succinct and interpretable conditions for the affinity to be subspace preserving in the case of arbitrary subspaces. In particular, our conditions are naturally related to those for SSC-BP, which reveals insights about the relationship between these two sparsity-based subspace clustering methods. Moreover, our experimental results provide a much more detailed evaluation of the behavior of SSC-OMP for large-scale problems. It is also worth noting that conditions under which OMP gives a subspace-preserving representation had also been studied in [40]. Our paper presents a much more comprehensive study of OMP for the subspace clustering problem, by providing results under deterministic independent, deterministic arbitrary and random subspace models. In particular, our result for deterministic arbitrary models is much stronger than the main result in [40].


2. SSC by Orthogonal Matching Pursuit 

The SSC algorithm approaches the subspace clustering problem by finding a sparse representation of each point in terms of other data points. Since each point in $S_i$ can be expressed in terms of at most $d_i \ll N$ other points in $S_i$, such a sparse representation always exists. In principle, we can find it by solving the following optimization problem:

$$c_j^* = \arg\min_{c_j} \|c_j\|_0 \quad \text{s.t.} \quad x_j = X c_j,\ c_{jj} = 0, \qquad (3)$$

where $\|c\|_0$ counts the number of nonzero entries in $c$. Since this problem is NP-hard, the SSC method in [13] relaxes this problem and solves the following $\ell_1$ problem:

$$c_j^* = \arg\min_{c_j} \|c_j\|_1 \quad \text{s.t.} \quad x_j = X c_j,\ c_{jj} = 0. \qquad (4)$$

Since this problem is called the basis pursuit (BP) problem, we refer to the SSC algorithm in [13] as SSC-BP.

The optimization problems (3) and (4) have been studied 
extensively in the compressed sensing community, see, e.g., 
the tutorials [5, 7], and it is well known that, under certain 
conditions on the dictionary X, their solutions are the same. 
However, results from compressed sensing do not apply to 
the subspace clustering problem because when the columns 
of X lie in a union of subspaces the solution for C need 
not be unique (see Section 3 for more details). This has 
motivated extensive research on the conditions under which 
the solutions of (3) or (4) are useful for subspace clustering. 

It is shown in [13, 14, 15] that when the subspaces are either independent or disjoint, and the data are noise free and well distributed, both (3) and (4) provide a sparse representation $c_j$ that is subspace preserving, as defined next.

Definition 1 (Subspace-preserving representation). A representation $c \in \mathbb{R}^N$ of a point $x \in S_i$ in terms of the dictionary $X = [x_1, \dots, x_N]$ is called subspace preserving if its nonzero entries correspond to points in $S_i$, i.e.,

$$\forall j = 1, \dots, N, \quad c_j \neq 0 \implies x_j \in S_i. \qquad (5)$$
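Definition 1 translates directly into a membership test on the nonzero entries of $c$. In this minimal sketch (the function name and the `labels` encoding are hypothetical), the optional `tol` treats tiny coefficients as zero, since numerical solvers rarely return exact zeros.

```python
def is_subspace_preserving(c, labels, i, tol=0.0):
    """Definition 1: every entry of c with |c_j| > tol must belong to a point of S_i.

    c      -- representation coefficients, one per dictionary column x_j
    labels -- labels[j] is the index of the subspace that contains x_j
    i      -- index of the subspace that contains the represented point
    """
    return all(labels[j] == i for j, cj in enumerate(c) if abs(cj) > tol)


labels = [0, 0, 0, 1, 1]
print(is_subspace_preserving([0.0, -1.0, 1.0, 0.0, 0.0], labels, 0))  # True
print(is_subspace_preserving([0.0, -1.0, 1.0, 0.3, 0.0], labels, 0))  # False
```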

In practice, however, solving $N$ $\ell_1$-minimization problems over $N$ variables may be prohibitive when $N$ is large. As an alternative, consider the following program:

$$c_j^* = \arg\min_{c_j} \|x_j - X c_j\|_2^2 \quad \text{s.t.} \quad \|c_j\|_0 \le k,\ c_{jj} = 0. \qquad (6)$$

It is shown in [32, 11] that, under certain conditions, this problem can be solved using the orthogonal matching pursuit (OMP) algorithm [27] (Algorithm 1). OMP solves the problem $\min_c \|Ac - b\|_2^2$ s.t. $\|c\|_0 \le k$ greedily by selecting one column of $A = [a_1, \dots, a_M]$ at a time (the one that maximizes the absolute value of the dot product with the residual in line 3) and computing the coefficients for the selected columns until $k$ columns are selected. For subspace clustering purposes, the vector $c_j^* \in \mathbb{R}^N$ (the $j$th column of $C^* \in \mathbb{R}^{N \times N}$) is computed as $\mathrm{OMP}(X_{-j}, x_j) \in \mathbb{R}^{N-1}$ with a zero inserted in its $j$th entry, where $X_{-j}$ is the data matrix with the $j$th column removed. After $C^*$ is computed, the segmentation of the data is found by applying spectral clustering to the affinity matrix $W = |C^*| + |C^{*\top}|$ as done in SSC-BP. The procedure is summarized in Algorithm 2.

Algorithm 1: Orthogonal Matching Pursuit (OMP)

Input: $A = [a_1, \dots, a_M] \in \mathbb{R}^{D \times M}$, $b \in \mathbb{R}^D$, $k_{\max}$, $\epsilon$.
1: Initialize $k = 0$, residual $q_0 = b$, support set $T_0 = \emptyset$.
2: while $k < k_{\max}$ and $\|q_k\|_2 > \epsilon$ do
3:   $T_{k+1} = T_k \cup \{i^*\}$, where $i^* = \arg\max_{i = 1, \dots, M} |a_i^\top q_k|$.
4:   $q_{k+1} = (I - P_{T_{k+1}}) b$, where $P_{T_{k+1}}$ is the projection onto the span of the vectors $\{a_j, j \in T_{k+1}\}$.
5:   $k \leftarrow k + 1$.
6: end while
Output: $c^* = \arg\min_{c:\, \operatorname{supp}(c) \subseteq T_k} \|b - Ac\|_2$.

Algorithm 2: Sparse Subspace Clustering by Orthogonal Matching Pursuit (SSC-OMP)

Input: Data $X = [x_1, \dots, x_N]$, parameters $k_{\max}$, $\epsilon$.
1: Compute $c_j^*$ from $\mathrm{OMP}(X_{-j}, x_j)$ for $j = 1, \dots, N$ using Algorithm 1.
2: Set $C^* = [c_1^*, \dots, c_N^*]$ and $W = |C^*| + |C^{*\top}|$.
3: Compute segmentation from $W$ by spectral clustering.
Output: Segmentation of data $X$.
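Algorithms 1 and 2 can be sketched in plain Python as follows. This is an illustrative implementation under stated assumptions, not the authors' code: the least-squares output step uses Gram-Schmidt (a QR factorization) instead of a generic solver, already-selected columns are skipped in step 3 (in exact arithmetic their correlation with the residual is zero anyway), ties in step 3 go to the smallest index, and the spectral clustering of step 3 of Algorithm 2 is not shown. The names `omp` and `ssc_omp` are hypothetical.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def omp(A, b, k_max, eps=1e-10):
    """Sketch of Algorithm 1. A: list of M columns (each a list of D floats); b: list of D floats.
    Returns the coefficient vector c (length M), supported on the selected set T_k."""
    support, basis = [], []   # T_k and an orthonormal basis of span{a_i : i in T_k}
    residual = list(b)        # q_0 = b
    while len(support) < k_max and math.sqrt(dot(residual, residual)) > eps:
        # Step 3: column most correlated with the residual; max() keeps the first index on ties.
        candidates = [i for i in range(len(A)) if i not in support]
        if not candidates:
            break
        i_star = max(candidates, key=lambda i: abs(dot(A[i], residual)))
        # Gram-Schmidt: orthogonalize the new column against the current basis.
        v = list(A[i_star])
        for u in basis:
            p = dot(u, v)
            v = [vi - p * ui for vi, ui in zip(v, u)]
        nv = math.sqrt(dot(v, v))
        if nv <= eps:         # numerically dependent column: stop instead of dividing by ~0
            break
        basis.append([vi / nv for vi in v])
        support.append(i_star)
        # Step 4: q_{k+1} = (I - P_{T_{k+1}}) b.
        residual = list(b)
        for u in basis:
            p = dot(u, residual)
            residual = [ri - p * ui for ri, ui in zip(residual, u)]
    # Output: least squares on T_k via A_T = Q R, then back substitution for R c_T = Q^T b.
    k = len(support)
    R = [[dot(basis[r], A[support[s]]) for s in range(k)] for r in range(k)]
    y = [dot(basis[r], b) for r in range(k)]
    c_T = [0.0] * k
    for r in reversed(range(k)):
        c_T[r] = (y[r] - sum(R[r][s] * c_T[s] for s in range(r + 1, k))) / R[r][r]
    c = [0.0] * len(A)
    for pos, i in enumerate(support):
        c[i] = c_T[pos]
    return c


def ssc_omp(X, k_max, eps=1e-10):
    """Sketch of steps 1-2 of Algorithm 2. X: list of N unit-norm points.
    Returns C (C[i][j] = coefficient of x_i in the representation of x_j) and W = |C| + |C^T|."""
    N = len(X)
    C = [[0.0] * N for _ in range(N)]
    for j in range(N):
        others = [i for i in range(N) if i != j]            # columns of X_{-j}
        c = omp([X[i] for i in others], X[j], k_max, eps)
        for pos, i in enumerate(others):                    # re-insert the zero at entry j
            C[i][j] = c[pos]
    W = [[abs(C[i][j]) + abs(C[j][i]) for j in range(N)] for i in range(N)]
    return C, W


# Toy usage: two lines in R^3 (the x-axis and the y-axis), two points on each.
X = [[1.0, 0.0, 0.0], [-1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, -1.0, 0.0]]
C, W = ssc_omp(X, k_max=2)
print(C[1][0], C[3][2])  # -1.0 -1.0
```

Storing $C$ with `C[i][j]` as the coefficient of $x_i$ in the representation of $x_j$ keeps the convention $x_j = X c_j$ of (1), so column $j$ of `C` is $c_j^*$.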

3. Theoretical Analysis of SSC-OMP 

OMP has been shown to be effective for sparse recovery, with the advantage over BP that it admits simple, fast implementations. However, note that existing conditions for the correctness of OMP for sparse recovery are too strong for the subspace clustering problem. In particular, note that the matrix $X$ need not satisfy the mutual incoherence [32] or restricted isometry properties [11], as two points in a subspace could be arbitrarily close to each other. More importantly, these conditions are not applicable here because our goal is not to recover a unique sparse solution. In fact, the sparse solution is not unique since any $d_i$ linearly independent points from $S_i$ can represent a point $x_j \in S_i$. Therefore, there is a need to find conditions under which the output of OMP (which need not coincide with the solution of (6) or (3)) is guaranteed to be subspace preserving.

This section is devoted to studying sufficient conditions under which SSC-OMP gives a subspace-preserving representation. Our analysis assumes that the data is noiseless. The termination parameters of Algorithm 1 are $\epsilon = 0$ and $k_{\max}$ large enough (e.g., $k_{\max} = N$). We also assume that the columns of $X$ are normalized to unit $\ell_2$ norm. To make our results consistent with state-of-the-art results, we first study the case where the subspaces are deterministic, including both independent subspaces as well as arbitrary subspaces. We then study the case where both the subspaces and the data points are drawn at random.

¹ If the argmax in step 3 of Algorithm 1 gives multiple items, pick one of them in a deterministic way, e.g., pick the one with the smallest index.

3.1. Independent Deterministic Subspace Model 

We first consider the case where the subspaces are fixed, 
the data points are fixed, and the subspaces are independent. 

Definition 2. A collection of subspaces $\{S_i\}_{i=1}^n$ is called independent if $\dim\big(\sum_i S_i\big) = \sum_i \dim(S_i)$, where $\sum_i S_i$ is defined as the subspace $\{\sum_i x_i : x_i \in S_i\}$.

Notice that two subspaces are independent if and only if they are disjoint, i.e., if they intersect only at the origin. However, pairwise disjoint subspaces need not be independent, e.g., three distinct lines through the origin in $\mathbb{R}^2$ are disjoint but not independent. Notice also that any subset of a set of independent subspaces is also independent. Therefore, any two subspaces in a set of independent subspaces are independent and hence disjoint. In particular, this implies that if $\{S_i\}_{i=1}^n$ are independent, then $S_i$ and $S_{(i)} := \sum_{m \neq i} S_m$ are independent.
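Definition 2 can be checked numerically by comparing ranks: $\dim(\sum_i S_i)$ is the rank of all basis vectors stacked together. The sketch below (plain Python, Gaussian elimination with a tolerance; the helper names `rank` and `are_independent` are hypothetical) reproduces the three-lines-in-$\mathbb{R}^2$ example from the text.

```python
def rank(vectors, tol=1e-10):
    """Rank of a list of vectors via Gaussian elimination with partial pivoting."""
    rows = [list(map(float, v)) for v in vectors]
    r = 0
    n_cols = len(rows[0]) if rows else 0
    for col in range(n_cols):
        pivot = max(range(r, len(rows)), key=lambda k: abs(rows[k][col]), default=None)
        if pivot is None or abs(rows[pivot][col]) <= tol:
            continue                      # no usable pivot in this column
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for k in range(r + 1, len(rows)):
            f = rows[k][col] / rows[r][col]
            rows[k] = [a - f * b for a, b in zip(rows[k], rows[r])]
        r += 1
    return r


def are_independent(bases):
    """Definition 2: dim(sum of S_i) == sum of dim(S_i); each basis is a list of vectors."""
    stacked = [v for B in bases for v in B]
    return rank(stacked) == sum(rank(B) for B in bases)


# Three lines in R^2: every pair is independent (hence disjoint), the triple is not.
L1, L2, L3 = [[1, 0]], [[0, 1]], [[1, 1]]
print(are_independent([L1, L2]), are_independent([L1, L3]), are_independent([L2, L3]))  # True True True
print(are_independent([L1, L2, L3]))  # False
```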

To establish conditions under which SSC-OMP gives a subspace-preserving affinity for independent subspaces, it is important to note that when computing $\mathrm{OMP}(X_{-j}, x_j)$, the goal is to select other points in the same subspace as $x_j$. The process for selecting these points occurs in step 3 of Algorithm 1, where the dot products between all points $x_m$, $m \neq j$, and the current residual $q_k$ are computed and the point with the highest product (in absolute value) is chosen. Since in the first iteration the residual is $q_0 = x_j$, we could immediately choose a point $x_m$ in another subspace whenever the dot product of $x_j$ with a point in another subspace is larger than the dot product of $x_j$ with points in its own subspace. What the following theorem shows is that, even though OMP may select points in the wrong subspaces as the iterations proceed, the coefficients associated to points in other subspaces will be zero at the end. Therefore, OMP (with $\epsilon = 0$ and $k_{\max} = N - 1$) is guaranteed to find a subspace-preserving representation.

Theorem 1. If the subspaces are independent, OMP gives 
a subspace-preserving representation of each data point. 

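Proof. [Sketch only] Assume that $x_j \in S_i$. Since $\epsilon = 0$ and $k_{\max}$ is large, OMP gives an exact representation, i.e., $x_j = X c_j$ and $c_{jj} = 0$. Write $x_j = u + v$, where $u$ is the contribution of the selected points in $S_i$ and $v$ that of the selected points in $S_{(i)}$. Then $v = x_j - u \in S_i \cap S_{(i)} = \{0\}$, because $S_i$ and $S_{(i)}$ are independent. Since the columns selected by OMP are linearly independent (a column in the span of the selected ones is orthogonal to the residual), $v = 0$ forces the coefficients of data points in $S_{(i)}$ to be zero. □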
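Theorem 1 can be observed on a toy instance. In the sketch below (the data are a hand-built assumption), $S_1 = \operatorname{span}\{e_1, e_2\}$ and $S_2 = \operatorname{span}\{(1,1,1)\}$ are independent in $\mathbb{R}^3$, and the point of $S_2$ is the one most correlated with $x_j$, so OMP selects it first; its final coefficient nonetheless vanishes, as the theorem predicts. The OMP routine is a compact plain-Python sketch of Algorithm 1 (Gram-Schmidt least squares, ties broken by the smallest index) that also returns the selection order; its name is hypothetical.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def omp(A, b, k_max, eps=1e-10):
    """Compact OMP sketch (Algorithm 1) returning (coefficients, selection order)."""
    support, basis, residual = [], [], list(b)
    while len(support) < k_max and math.sqrt(dot(residual, residual)) > eps:
        candidates = [i for i in range(len(A)) if i not in support]
        if not candidates:
            break
        i_star = max(candidates, key=lambda i: abs(dot(A[i], residual)))  # first index on ties
        v = list(A[i_star])
        for u in basis:                                   # Gram-Schmidt step
            p = dot(u, v)
            v = [vi - p * ui for vi, ui in zip(v, u)]
        nv = math.sqrt(dot(v, v))
        if nv <= eps:
            break
        basis.append([vi / nv for vi in v])
        support.append(i_star)
        residual = list(b)                                # q = (I - P_T) b
        for u in basis:
            p = dot(u, residual)
            residual = [ri - p * ui for ri, ui in zip(residual, u)]
    k = len(support)                                      # least squares: R c_T = Q^T b
    R = [[dot(basis[r], A[support[s]]) for s in range(k)] for r in range(k)]
    y = [dot(basis[r], b) for r in range(k)]
    c_T = [0.0] * k
    for r in reversed(range(k)):
        c_T[r] = (y[r] - sum(R[r][s] * c_T[s] for s in range(r + 1, k))) / R[r][r]
    c = [0.0] * len(A)
    for pos, i in enumerate(support):
        c[i] = c_T[pos]
    return c, support


s, t = 1 / math.sqrt(2), 1 / math.sqrt(3)
x_j = [s, s, 0.0]                 # query point in S_1 = span{e1, e2}
A = [[1.0, 0.0, 0.0],             # e1, in S_1
     [0.0, 1.0, 0.0],             # e2, in S_1
     [t, t, t]]                   # unit point of S_2 = span{(1, 1, 1)}; S_1 and S_2 are independent
c, order = omp(A, x_j, k_max=len(A))
print(order)               # [2, 0, 1]: the point from S_2 is selected first,
print(abs(c[2]) < 1e-9)    # True: but its final coefficient is (numerically) zero
```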

3.2. Arbitrary Deterministic Subspace Model 

We will now consider a more general class of subspaces, which need not be independent or disjoint, and investigate conditions under which OMP gives a subspace-preserving representation. In the following, $X^i \in \mathbb{R}^{D \times N_i}$ denotes the submatrix of $X$ containing the points in the $i$th subspace; for any $x_j \in S_i$, $X^i_{-j} \in \mathbb{R}^{D \times (N_i - 1)}$ denotes the matrix $X^i$ with the point $x_j$ removed; $\mathcal{X}^i$ and $\mathcal{X}^i_{-j}$ denote respectively the set of vectors contained in the columns of $X^i$ and $X^i_{-j}$.

Now, it is easy to see that a sufficient condition for $\mathrm{OMP}(X_{-j}, x_j)$ to be subspace preserving is that for each $k$ in step 3 of Algorithm 1, the point that maximizes the dot product lies in the same subspace as $x_j$. Since $q_0 = x_j$ and $q_1$ is equal to $x_j$ minus the projection of $x_j$ onto the subspace spanned by the selected point, say $x$, it follows that if $x_j, x \in S_i$ then $q_1 \in S_i$. By a simple induction argument, it follows that if all the selected points are in $S_i$, then so are the residuals $\{q_k\}$. This suggests that the condition for $\mathrm{OMP}(X_{-j}, x_j)$ to be subspace preserving must depend on the dot products between the data points and a subset of the set of residuals (the subset contained in the same subspace as $x_j$). This motivates the following definition and lemma.

Definition 3. Let $\mathcal{Q}(A, b)$ be the set of all residual vectors computed in step 4 of $\mathrm{OMP}(A, b)$. The set of OMP residual directions associated with matrix $X^i_{-j}$ and point $x_j \in S_i$ is defined as:

$$\mathcal{W}^i_j := \Big\{ w = \frac{q}{\|q\|_2} : q \in \mathcal{Q}(X^i_{-j}, x_j),\ q \neq 0 \Big\}. \qquad (7)$$

The set of OMP residual directions associated with the data matrix $X^i$ is defined as $\mathcal{W}^i := \bigcup_{j:\, x_j \in S_i} \mathcal{W}^i_j$.

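Lemma 1. OMP gives a subspace-preserving representation for point $x_j \in S_i$ in at most $d_i$ iterations if

$$\forall w \in \mathcal{W}^i_j, \quad \max_{x \in \bigcup_{k \neq i} \mathcal{X}^k} |w^\top x| < \max_{x \in \mathcal{X}^i \setminus \{x_j\}} |w^\top x|. \qquad (8)$$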

Proof. [Sketch only] By using an induction argument, it is easy to see that the condition in (8) implies that the sequence of residuals of $\mathrm{OMP}(X_{-j}, x_j)$ is the same as that of the fictitious problem $\mathrm{OMP}(X^i_{-j}, x_j)$. Hence, the output of $\mathrm{OMP}(X_{-j}, x_j)$ is the same as that of $\mathrm{OMP}(X^i_{-j}, x_j)$, which is, by construction, subspace preserving. □

Intuitively, Lemma 1 tells us that if the largest dot product between the residual directions for subspace $i$ and the data points in all other subspaces is smaller than the largest dot product between the residual directions for subspace $i$ and the points in subspace $i$ other than $x_j \in S_i$, then OMP gives a subspace-preserving representation. While such a condition is very intuitive from the perspective of OMP, it is not as intuitive from the perspective of subspace clustering as it does not rely on the geometry of the problem. Specifically, it does not directly depend on the relative configuration of the subspaces or the distribution of the data in the subspaces. In what follows, we derive conditions on the subspaces and the data that guarantee that the condition in (8) holds. Before doing so, we need some additional definitions.
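Condition (8) is easy to test once the residual directions are known. The sketch below takes them as input rather than computing them (in the lemma they come from running OMP on $X^i_{-j}$); the function name, the directions, and the points in the example are hypothetical values chosen to show one case where (8) holds and one where a point from another subspace lies too close to $S_1$.

```python
import math


def lemma1_condition(directions, inside, outside):
    """Condition (8): for every residual direction w,
    max over points of other subspaces of |w.x| < max over points of S_i other than x_j of |w.x|.

    directions -- unit residual directions W^i_j (assumed precomputed)
    inside     -- the points of X^i other than x_j
    outside    -- the points of all other subspaces
    """
    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    return all(max(abs(dot(w, x)) for x in outside) < max(abs(dot(w, x)) for x in inside)
               for w in directions)


s = 1 / math.sqrt(2)
inside = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]          # points of S_1 = span{e1, e2}
directions = [[s, s, 0.0], [0.0, 1.0, 0.0]]          # hypothetical residual directions in S_1
far = [[0.6, 0.0, 0.8]]                              # point of another subspace, well separated
near = [[0.7, 0.7, math.sqrt(0.02)]]                 # point of another subspace, close to S_1
print(lemma1_condition(directions, inside, far))     # True
print(lemma1_condition(directions, inside, near))    # False
```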

Definition 4. The coherence between two sets of points of unit norm, $\mathcal{X}$ and $\mathcal{Y}$, is defined as $\mu(\mathcal{X}, \mathcal{Y}) = \max_{x \in \mathcal{X},\, y \in \mathcal{Y}} |\langle x, y \rangle|$.

The coherence measures the degree of “similarity” between two sets of points. In our case, we can see that the left hand side of (8) is bounded above by the coherence between the sets $\mathcal{W}^i$ and $\bigcup_{k \neq i} \mathcal{X}^k$. As per (8), this coherence should be small, which implies that data points from different subspaces should be sufficiently separated (in angle).
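Definition 4 is a single maximum over pairs of points. A plain-Python sketch follows; the function name and the toy points are assumptions for illustration.

```python
import math


def coherence(X_set, Y_set):
    """Definition 4: mu(X, Y) = max over x in X, y in Y of |<x, y>| (points assumed unit norm)."""
    return max(abs(sum(a * b for a, b in zip(x, y))) for x in X_set for y in Y_set)


s = 1 / math.sqrt(2)
X1 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]   # points of S_1 = span{e1, e2}
X2 = [[s, 0.0, s]]                        # a point of another subspace, 45 degrees away from e1
print(round(coherence(X1, X2), 4))        # 0.7071
```

A coherence close to 1 means that some pair of points from the two sets is nearly aligned, which is the situation the separation requirement rules out.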

Definition 5. The inradius $r(\mathcal{P})$ of a convex body $\mathcal{P}$ is the radius of the largest Euclidean ball inscribed in $\mathcal{P}$.

As shown in Lemma 2, the right hand side of (8) is bounded below by $r(\mathcal{P}^i_{-j})$, where $\mathcal{P}^i_{-j} := \operatorname{conv}(\pm \mathcal{X}^i_{-j})$ is the symmetrized convex hull of the points in the $i$th subspace other than $x_j$, i.e., $\mathcal{X}^i_{-j}$. Therefore, (8) suggests that the minimum inradius $r_i := \min_j r(\mathcal{P}^i_{-j})$ should be large, which means the points in $S_i$ should be well distributed.
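Computing an inradius is not trivial in general, but for a two-dimensional subspace it reduces to plane geometry. The sketch below assumes $d_i = 2$ and that the points are given in coordinates of an orthonormal basis of $S_i$. It relies on the fact that the largest ball inscribed in a centrally symmetric convex body can be taken to be centered at the origin, so the inradius is the smallest distance from the origin to an edge of the convex hull (computed with Andrew's monotone chain). The helper names are hypothetical.

```python
import math


def cross(o, a, b):
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Andrew's monotone chain; returns the hull vertices in counter-clockwise order."""
    pts = sorted(set(map(tuple, points)))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def symmetric_inradius(points):
    """Inradius of conv(+-points) for 2-D points: smallest origin-to-edge distance of the hull."""
    sym = [tuple(p) for p in points] + [(-p[0], -p[1]) for p in points]
    hull = convex_hull(sym)
    if len(hull) < 3:
        return 0.0   # all points collinear: the body has no interior in the plane
    return min(abs(p[0] * q[1] - p[1] * q[0]) / math.dist(p, q)
               for p, q in zip(hull, hull[1:] + hull[:1]))


def min_leave_one_out_inradius(points):
    """r_i = min_j r(P^i_{-j}): drop each point x_j in turn."""
    return min(symmetric_inradius(points[:j] + points[j + 1:]) for j in range(len(points)))


s = 1 / math.sqrt(2)
print(round(symmetric_inradius([(1.0, 0.0), (0.0, 1.0)]), 4))                  # 0.7071
print(round(min_leave_one_out_inradius([(1.0, 0.0), (0.0, 1.0), (s, s)]), 4))  # 0.3827
```

In the second call, dropping $(1, 0)$ or $(0, 1)$ leaves a thin parallelogram, so $r_i$ is much smaller than the inradius of the full set: a single poorly covered direction is enough to make $r_i$ small.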

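Lemma 2. Let $x_j \in S_i$. Then, for all $w \in \mathcal{W}^i_j$, we have:

$$\max_{x \in \bigcup_{k \neq i} \mathcal{X}^k} |w^\top x| \le \max_{k:\, k \neq i} \mu(\mathcal{W}^i, \mathcal{X}^k) \le \max_{k:\, k \neq i} \frac{\mu(\mathcal{X}^i, \mathcal{X}^k)}{r_i},$$

$$\max_{x \in \mathcal{X}^i \setminus \{x_j\}} |w^\top x| \ge r(\mathcal{P}^i_{-j}) \ge r_i. \qquad (9)$$

Proof. The proof can be found in the Appendix. □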

Lemma 2 allows us to make the condition of Lemma 1 
more interpretable, as stated in the following theorem. 

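Theorem 2. The output of OMP is subspace preserving if

$$\forall i = 1, \dots, n, \quad \max_{k:\, k \neq i} \mu(\mathcal{W}^i, \mathcal{X}^k) < r_i. \qquad (10)$$

Corollary 1. The output of OMP is subspace preserving if

$$\forall i = 1, \dots, n, \quad \max_{k:\, k \neq i} \mu(\mathcal{X}^i, \mathcal{X}^k) < r_i^2. \qquad (11)$$

Note that points in $\mathcal{W}^i$ are all in subspace $S_i$, as step 4 of $\mathrm{OMP}(A := X^i_{-j}, b := x_j)$ has $b$ and the columns of $A$ both in $S_i$.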

The conditions (10) and (11) thus show that for each subspace $S_i$, a set of points (i.e., $\mathcal{X}^i$ or $\mathcal{W}^i$) in $S_i$ should have low coherence with all points from other subspaces, and that points in $\mathcal{X}^i$ should be uniformly located in $S_i$ to have a large inradius. This is in agreement with the intuition that points from different subspaces should be well separated, and points within a subspace should be well distributed.

For a comparison of Corollary 1 and Theorem 2, note that by Lemma 2 condition (11) implies condition (10), since $\mu(\mathcal{W}^i, \mathcal{X}^k) \le \mu(\mathcal{X}^i, \mathcal{X}^k)/r_i < r_i$; hence (10) is the less restrictive condition, making Theorem 2 preferable. Yet Corollary 1 has the advantage that both sides of condition (11) depend directly on the data points in $X$, while condition (10) depends on the residual points in $\mathcal{W}^i$, making it algorithm specific.

Another important thing to notice is that conditions (10) and (11) can be satisfied even if the subspaces are neither independent nor disjoint. For example, consider the case where $S_i \cap S_k \neq \{0\}$. Then, the coherence $\mu(\mathcal{W}^i, \mathcal{X}^k)$ could still be small as long as no points in $\mathcal{W}^i$ and $\mathcal{X}^k$ are near the intersection of $S_i$ and $S_k$. Actually, even this is too strong an assumption, since the intersection is a subspace (of dimension possibly greater than one). Thus, $x \in \mathcal{X}^k$ and $y \in \mathcal{W}^i$ could both be very close to the intersection yet have low coherence. The same argument also works for condition (11). Admittedly, under specific distributions of points, it is possible that there exist $x \in \mathcal{X}^k$ and $y \in \mathcal{W}^i$ that are arbitrarily close to each other when they are near the intersection. However, this worst-case scenario is unlikely to happen if we consider a random model, as discussed next.


3.3. Arbitrary Random Subspace Model 


This section considers the fully random union of subspaces model in [30], where the basis elements of each subspace are chosen uniformly at random from the unit sphere of the ambient space and the data points from each subspace are uniformly distributed on the unit sphere of that subspace. Theorem 3 shows that the sufficient condition in (10) holds true with high probability (i.e., the probability goes to 1 as the density of points grows to infinity) given some conditions on the subspace dimension $d$, the ambient space dimension $D$, the number of subspaces $n$ and the number of data points per subspace.
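The random model can be simulated with the Python standard library. In this sketch (function names, seed and sizes are illustrative assumptions), each subspace basis is obtained by Gram-Schmidt on Gaussian vectors, whose span has the same distribution as the span of independent uniformly random unit vectors, and each point is a normalized Gaussian coefficient vector mapped through the orthonormal basis, which makes it uniform on the unit sphere of the subspace.

```python
import math
import random


def normalize(v):
    n = math.sqrt(sum(a * a for a in v))
    return [a / n for a in v]


def random_basis(D, d, rng):
    """Orthonormal basis of a random d-dimensional subspace of R^D (Gram-Schmidt on Gaussians)."""
    basis = []
    while len(basis) < d:
        v = [rng.gauss(0.0, 1.0) for _ in range(D)]
        for u in basis:
            p = sum(a * b for a, b in zip(u, v))
            v = [a - p * b for a, b in zip(v, u)]
        if math.sqrt(sum(a * a for a in v)) > 1e-8:  # redraw in the (probability-zero) dependent case
            basis.append(normalize(v))
    return basis


def random_union_of_subspaces(n, d, D, points_per_subspace, seed=0):
    """Return (points, labels): points uniform on the unit sphere of each of n random subspaces."""
    rng = random.Random(seed)
    X, labels = [], []
    for i in range(n):
        U = random_basis(D, d, rng)
        for _ in range(points_per_subspace):
            a = normalize([rng.gauss(0.0, 1.0) for _ in range(d)])  # uniform on the sphere of R^d
            X.append([sum(a[k] * U[k][t] for k in range(d)) for t in range(D)])
            labels.append(i)
    return X, labels


# rho * d + 1 points per subspace, as in Theorem 3.
n, d, D, rho = 5, 6, 9, 5
X, labels = random_union_of_subspaces(n, d, D, rho * d + 1)
print(len(X))  # 155
```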


Theorem 3. Assume a random union of subspaces model where all subspaces are of equal dimension $d$ and the number of data points in each subspace is $\rho d + 1$, where $\rho > 1$ is the “density”, so that the total number of data points in all subspaces is $N(n, \rho, d) = n(\rho d + 1)$. The output of OMP is subspace preserving with probability $p > 1 - \frac{2d}{N(n, \rho, d)} - N(n, \rho, d)\, e^{-\sqrt{\rho}\, d}$ if

$$d < \frac{c^2(\rho) \log \rho}{12 \log N(n, \rho, d)} D, \qquad (12)$$

where $c(\rho) > 0$ is a constant that depends only on $\rho$.


One interpretation of the condition in (12) is that the dimension $d$ of the subspaces should be small relative to the ambient dimension $D$. It also shows that as the number of subspaces $n$ increases, the factor $\log N(n, \rho, d)$ also increases, making the condition more difficult to satisfy. In terms of the density $\rho$, it is shown in [30] that there exists a $\rho_0$ such that $c(\rho) = 1/\sqrt{8}$ when $\rho > \rho_0$. Then, it is easy to see that when $\rho > \rho_0$, the term that depends on $\rho$ is $\frac{\log \rho}{\log N(n, \rho, d)} = \frac{\log \rho}{\log(n(\rho d + 1))}$, which is a monotonically increasing function of $\rho$. This makes the condition easier to satisfy as the density of points in the subspaces increases. Moreover, the lower bound on the probability of success, $1 - \frac{2d}{N(n, \rho, d)} - N(n, \rho, d)\, e^{-\sqrt{\rho}\, d}$, is also an increasing function of $\rho$ when $\rho$ is greater than a threshold value. As a consequence, as the density of the points increases, the condition in Theorem 3 becomes easier to satisfy and the probability of success also increases.
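The quantities in Theorem 3 can be tabulated to see how they vary with the density. This sketch assumes $c(\rho) = 1/\sqrt{8}$, which per [30] holds only for $\rho > \rho_0$ with $\rho_0$ unspecified here, so the numbers are illustrative rather than a verified guarantee; the function name is hypothetical.

```python
import math


def theorem3_quantities(n, rho, d, D, c_rho=1 / math.sqrt(8)):
    """Return N(n, rho, d), the right-hand side of (12), and the probability bound of Theorem 3.

    c_rho defaults to 1/sqrt(8), a value established in [30] only for rho above some rho_0.
    """
    N = n * (rho * d + 1)
    rhs = c_rho ** 2 * math.log(rho) / (12 * math.log(N)) * D
    prob = 1 - 2 * d / N - N * math.exp(-math.sqrt(rho) * d)
    return N, rhs, prob


# Fixed n, d, D (the sizes used in the synthetic experiments) and growing density rho.
for rho in (10, 100, 1000, 10000):
    N, rhs, prob = theorem3_quantities(n=5, rho=rho, d=6, D=9)
    print(rho, N, round(rhs, 4), round(prob, 4))
```

Both the right-hand side of (12) and the probability bound grow with $\rho$. For $n = 5$, $d = 6$, $D = 9$ the right-hand side stays below $9/96 \approx 0.094$ for every $\rho$, far from $d = 6$, so (12) does not hold for those sizes; since (12) is only sufficient, this says nothing about whether OMP fails there.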


4. Relationships with Other Methods 


In this section we compare our results for SSC-OMP with those for other methods of the general form in (2). These methods include SSC-BP [14, 15, 30], which uses the $\ell_1$ norm as a regularizer, LRR [22] and LRSC [34], which use the nuclear norm, and LSR [25], which uses the $\ell_2$ norm. We also compare our results to those of [12] for SSC-OMP. The comparison is in terms of whether the solutions given by these alternative algorithms are subspace preserving.

Independent Subspaces. Independence is a strong assumption on the union of subspaces. Under this assumption, a subspace has a trivial intersection not only with every other subspace, but also with the sum of all other subspaces. This case turns out to be especially easy for a large category of self-expressive subspace clustering methods [25], and SSC-BP, LRR, LRSC and LSR are all able to give subspace-preserving representations. Thus, in this easy case, the proposed method is as good as state-of-the-art methods.
Arbitrary Subspaces. To the best of our knowledge, when the subspaces are not independent, there is no guarantee of correctness for LRR, LRSC and LSR. For SSC-BP, as shown in [30], the representation is subspace preserving if

$$\forall i = 1, \dots, n, \quad \max_{k:\, k \neq i} \mu(\mathcal{V}^i, \mathcal{X}^k) < r_i, \qquad (13)$$

where $\mathcal{V}^i$ is a set of $N_i$ dual directions associated with $\mathcal{X}^i$. When comparing (13) with our result in condition (10), we can see that the right hand sides are the same. However, the left hand sides are not directly comparable, as no general relationship is known between the sets $\mathcal{V}^i$ and $\mathcal{W}^i$. Nonetheless, notice that the numbers of points in these two sets are not the same since $\operatorname{card}(\mathcal{V}^i) = N_i$ and $\operatorname{card}(\mathcal{W}^i) = N_i d_i$. Therefore, if we assume that the points in $\mathcal{V}^i$ and $\mathcal{W}^i$ are distributed uniformly at random on the unit sphere, then $\mu(\mathcal{W}^i, \mathcal{X}^k)$ is expected to be larger than $\mu(\mathcal{V}^i, \mathcal{X}^k)$, making the condition for SSC-OMP less likely to be satisfied than that for SSC-BP. Now, when comparing (13) with our condition in (11), we see that the left hand sides are comparable under a random model where both $\mathcal{V}^i$ and $\mathcal{X}^i$ contain $N_i$ points. However, the right hand side is $r_i^2$, which is less than or equal to $r_i$ since the data are normalized and $r_i \le 1$. This again makes the condition for SSC-OMP more difficult to hold than that for SSC-BP. However, this difference is expected to vanish for large-scale problems, and SSC-OMP is computationally more efficient, as we will see in Section 5.
Random Subspaces. For the random model, [30] shows that SSC-BP gives a subspace-preserving representation with probability $p > 1 - \frac{2}{N(n, \rho, d)} - N(n, \rho, d)\, e^{-\sqrt{\rho}\, d}$ if

$$d < \frac{c^2(\rho) \log \rho}{12 \log N(n, \rho, d)} D. \qquad (14)$$

If we compare this result with that of Theorem 3, we can see that the condition under which both methods succeed with high probability is exactly the same. The difference between them is that SSC-BP has a higher probability of success than SSC-OMP when $d > 1$. However, it is easy to see that the difference in probability goes to zero as the density $\rho$ goes to infinity. This means that the performance difference vanishes as the scale of the problem increases.
Other Results for SSC-OMP. Finally, we compare our results with those in [12] for SSC-OMP. Define the principal angle between two subspaces $S_i$ and $S_k$ as:

$$\theta^*_{ik} = \min_{x \in S_i,\, \|x\|_2 = 1} \ \min_{y \in S_k,\, \|y\|_2 = 1} \arccos \langle x, y \rangle. \qquad (15)$$

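It is shown in [12] that the output of SSC-OMP is subspace preserving if, for all $i = 1, \dots, n$, $\max_{k:\, k \neq i} \mu(\mathcal{X}^i, \mathcal{X}^k)$ is smaller than $r_i$ minus a multiple of $\max_{k:\, k \neq i} \cos \theta^*_{ik}$ whose coefficient itself depends on $r_i$ (condition (16)).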

The merit of this result is that it introduces the subspace angles in the condition, and satisfies the intuition that the algorithm is more likely to work if the subspaces are far apart from each other. However, the right-hand side of the condition shows an intricate relationship between the intra-class property $r_i$ and the inter-class property $\theta^*_{ik}$, which greatly complicates the interpretation of the condition. More importantly, as is shown in the Appendix, the condition is more restrictive than (10), which makes Theorem 2 a stronger result.
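The smallest principal angle in (15) can be computed from orthonormal bases: $\cos\theta^*_{ik}$ is the largest singular value of $U_i^\top U_k$, where the columns of $U_i$ and $U_k$ are orthonormal bases of $S_i$ and $S_k$. The plain-Python sketch below estimates it by power iteration on $M^\top M$ with a seeded random start; the helper name and the iteration count are assumptions, and a library SVD would be the usual choice in practice.

```python
import math
import random


def smallest_principal_angle(Ui, Uk, iters=500, seed=0):
    """theta*_{ik} of (15), from lists Ui, Uk of orthonormal basis vectors of S_i and S_k.

    cos(theta*) is the largest singular value of Ui^T Uk, estimated by power iteration
    on the Gram matrix M^T M (convergence slows if the top singular values nearly tie)."""
    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    M = [[dot(u, v) for v in Uk] for u in Ui]                  # d_i x d_k matrix Ui^T Uk
    G = [[sum(M[r][a] * M[r][b] for r in range(len(Ui))) for b in range(len(Uk))]
         for a in range(len(Uk))]                              # M^T M, symmetric PSD
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in Uk]                       # generic (random) start vector
    sigma2 = 0.0
    for _ in range(iters):
        w = [dot(row, v) for row in G]
        norm = math.sqrt(dot(w, w))
        if norm == 0.0:                                        # M = 0: orthogonal subspaces
            return math.pi / 2
        v = [a / norm for a in w]
        sigma2 = dot(v, [dot(row, v) for row in G])            # Rayleigh quotient
    return math.acos(min(1.0, math.sqrt(max(sigma2, 0.0))))


e1, e2, e3 = [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]
s = 1 / math.sqrt(2)
print(math.degrees(smallest_principal_angle([e1, e2], [[s, 0.0, s]])))  # about 45
print(smallest_principal_angle([e1, e2], [e2, e3]))                     # 0.0: the planes intersect
```

The second call illustrates why the angle alone cannot certify success for intersecting subspaces: $\theta^*_{ik} = 0$ whenever $S_i \cap S_k \neq \{0\}$, even though conditions such as (10) may still hold.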

5. Experiments 

In this section, we first verify our theoretical results for SSC-OMP and compare them with those for SSC-BP by doing experiments on synthetic data using the random model. Specifically, we show that even if the subspaces are not independent, the solution of OMP is subspace preserving with a probability that grows with the density of data points. Second, we test the performance of the proposed method on clustering images of handwritten digits and human faces, and conclude that SSC-OMP achieves the best trade-off between accuracy and efficiency.

Methods. We compare the performance of state-of-the-art spectral subspace clustering methods, including LRSC [34], SSC-BP [15], LSR [25], and spectral curvature clustering (SCC) [8]. In real experiments, we use the code provided by the respective authors for computing the representation matrix $C^*$, where the parameters are tuned to give the best clustering accuracy. We then apply the normalized spectral clustering in [36] to the affinity $|C^*| + |C^{*\top}|$, except for SCC, which has its own spectral clustering step.

Metrics. We use two metrics to evaluate the degree to which the subspace-preserving property is satisfied. The first one is a direct measure of whether the solution is subspace preserving or not. However, for comparing with state-of-the-art methods whose output is generally not subspace preserving, the second one measures how close the coefficients are to being subspace preserving.

- Percentage of subspace-preserving representations (p%): this is the percentage of points whose representations are subspace preserving. Due to inexactness in the solvers, coefficients with absolute value less than $10^{-3}$ are considered zero. A subspace-preserving solution gives p = 100.

- Subspace-preserving representation error (e%) [15]: for each $c_j$ in (1), we compute the fraction of its $\ell_1$ norm that comes from other subspaces and then average over all $j$, i.e., $e = \frac{100}{N} \sum_j \big(1 - \sum_i (\omega_{ij} \cdot |c_{ij}|) / \|c_j\|_1\big)$, where $\omega_{ij} \in \{0, 1\}$ is the true affinity. A subspace-preserving $C$ gives e = 0.
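The two subspace-preserving metrics can be computed directly from $C$ and the ground-truth labels. In this sketch (hypothetical function name), `C[i][j]` is the coefficient of $x_i$ in $c_j$, the default threshold $10^{-3}$ follows the zero threshold stated for p%, and a column with $\|c_j\|_1 = 0$ is assumed to contribute zero error, a case the text does not address.

```python
def subspace_preserving_metrics(C, labels, tol=1e-3):
    """Return (p%, e%) for a coefficient matrix C with C[i][j] = coefficient of x_i in c_j."""
    N = len(labels)
    preserving, err = 0, 0.0
    for j in range(N):
        col = [C[i][j] for i in range(N)]
        # p%: every entry above the threshold must come from x_j's own subspace.
        if all(labels[i] == labels[j] for i in range(N) if abs(col[i]) > tol):
            preserving += 1
        # e%: fraction of the l1 norm of c_j carried by points of other subspaces.
        l1 = sum(abs(v) for v in col)
        same = sum(abs(col[i]) for i in range(N) if labels[i] == labels[j])
        err += (1.0 - same / l1) if l1 > 0 else 0.0
    return 100.0 * preserving / N, 100.0 * err / N


C_example = [[0.0, 0.5, 0.0, 0.0],
             [1.0, 0.0, 0.0, 0.0],
             [0.0, 0.5, 0.0, 1.0],
             [0.0, 0.0, 1.0, 0.0]]
print(subspace_preserving_metrics(C_example, [0, 0, 1, 1]))  # (75.0, 12.5)
```

Here only $c_1$ uses a point from the wrong subspace, and half of its $\ell_1$ norm comes from that point, so p = 75 and e = 0.5/4 × 100 = 12.5.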

Now, the performance of subspace clustering depends not only on the subspace-preserving property, but also on the connectivity of the similarity graph, i.e., whether the data points in each cluster form a connected component of the graph.

- Connectivity (c): For an undirected graph with weights $W \in \mathbb{R}^{N \times N}$ and degree matrix $D = \operatorname{diag}(W \cdot \mathbf{1})$, where $\mathbf{1}$ is the vector of all ones, we use the second smallest eigenvalue $\lambda_2$ of the normalized Laplacian $L = I - D^{-1/2} W D^{-1/2}$ to measure the connectivity of the graph; $\lambda_2$ is in the range $[0, \frac{N}{N-1}]$ and is zero if and only if the graph is not connected [17, 9]. In our case, we compute the algebraic connectivity for each cluster, $\lambda_2^i$, and take the quantity $c = \min_i \lambda_2^i$ as the measure of connectivity.
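The connectivity measure needs the second smallest eigenvalue of a small symmetric matrix per cluster. To stay dependency-free, this sketch uses cyclic Jacobi rotations for the eigenvalues; the helper names are hypothetical, each cluster is assumed to contain at least two points, and a vertex with zero degree is reported as $\lambda_2 = 0$ since the graph is then disconnected.

```python
import math


def jacobi_eigenvalues(S, max_sweeps=100, tol=1e-14):
    """Eigenvalues (ascending) of a real symmetric matrix via cyclic Jacobi rotations."""
    A = [list(map(float, row)) for row in S]
    n = len(A)
    for _ in range(max_sweeps):
        if sum(A[p][q] ** 2 for p in range(n) for q in range(n) if p != q) < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(A[p][q]) < 1e-15:
                    continue
                theta = (A[q][q] - A[p][p]) / (2.0 * A[p][q])
                t = (1.0 if theta >= 0 else -1.0) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A J: rotate columns p and q
                    akp, akq = A[k][p], A[k][q]
                    A[k][p], A[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- J^T A: rotate rows p and q
                    apk, aqk = A[p][k], A[q][k]
                    A[p][k], A[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return sorted(A[i][i] for i in range(n))


def algebraic_connectivity(W):
    """lambda_2 of L = I - D^{-1/2} W D^{-1/2}, with D = diag(W 1)."""
    n = len(W)
    deg = [sum(row) for row in W]
    if any(d_i <= 0 for d_i in deg):
        return 0.0  # an isolated vertex means the graph is disconnected
    L = [[(1.0 if i == j else 0.0) - W[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
         for i in range(n)]
    return jacobi_eigenvalues(L)[1]


def connectivity(W, labels):
    """c = min over clusters of lambda_2 of the subgraph restricted to that cluster."""
    values = []
    for g in sorted(set(labels)):
        idx = [i for i, lab in enumerate(labels) if lab == g]
        values.append(algebraic_connectivity([[W[i][j] for j in idx] for i in idx]))
    return min(values)


K3 = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]   # complete graph on three vertices
print(round(algebraic_connectivity(K3), 6))  # 1.5
```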

Finally, we use the following two metrics to evaluate the performance of subspace clustering methods.

- Clustering accuracy (a%): this is the percentage of correctly labeled data points. It is computed by matching the estimated and true labels as $a = \max_{\pi} \frac{100}{N} \sum_{i,j} Q^{\text{est}}_{\pi(i) j} Q^{\text{true}}_{ij}$, where $\pi$ is a permutation of the $n$ groups, and $Q^{\text{est}}$ and $Q^{\text{true}}$ are the estimated and ground-truth labeling of data, respectively, with their $(i, j)$th entry being equal to one if point $j$ belongs to cluster $i$ and zero otherwise.

- Running time (t): the time taken by each clustering task, using Matlab.
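Clustering accuracy requires the best matching between estimated and true groups. The brute-force sketch below enumerates all permutations, which is only practical for a handful of groups (the Hungarian algorithm is the usual choice otherwise); labels are assumed to be hashable group identifiers and the function name is hypothetical.

```python
from itertools import permutations


def clustering_accuracy(true_labels, est_labels):
    """a% = 100 * max over relabelings pi of the estimated groups of the fraction with pi(est) == true."""
    groups = sorted(set(true_labels) | set(est_labels))
    best = 0
    for perm in permutations(groups):
        relabel = dict(zip(groups, perm))
        hits = sum(relabel[e] == t for t, e in zip(true_labels, est_labels))
        best = max(best, hits)
    return 100.0 * best / len(true_labels)


print(clustering_accuracy([0, 0, 1, 1, 2], [2, 2, 0, 0, 1]))  # 100.0
print(clustering_accuracy([0, 0, 1, 1], [0, 1, 1, 1]))        # 75.0
```

The first call shows that accuracy ignores the arbitrary names of the estimated clusters: a pure relabeling still scores 100.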

The reported numbers in all the experiments of this section are averages over 20 trials.

5.1. Synthetic Experiments 

We randomly generate $n = 5$ subspaces, each of dimension $d = 6$, in an ambient space of dimension $D = 9$. Each subspace contains $N_i = \rho d$ sample points randomly generated on the unit sphere, where $\rho$ is varied from 5 to 3,333, so that the number of points varies from 150 to 99,990. For SSC-OMP we set $\epsilon$ in Algorithm 1 to be $10^{-3}$ and $k_{\max}$ to be $d = 6$. For SSC-BP we use the $\ell_1$-Magic solver. Due to the computational complexity, SSC-BP is run for $\rho \le 200$.
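As a quick arithmetic check of the sizes quoted above, the total number of points is $N = n \rho d$ with $n = 5$ and $d = 6$; note that here $N_i = \rho d$, whereas Theorem 3 uses $\rho d + 1$ points per subspace. The value $\rho = 200$ gives the 6,000-point limit used for SSC-BP in Figure 1. The function name is hypothetical.

```python
def total_points(n, rho, d):
    """Total number of points when each of the n subspaces holds N_i = rho * d points."""
    return n * rho * d


for rho in (5, 200, 3333):
    print(rho, total_points(5, rho, 6))
# 5 150
# 200 6000
# 3333 99990
```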

The subspace-preserving representation percentage and error are plotted in Figure 1(a) and 1(b). Observe that the probability that SSC-OMP gives a subspace-preserving solution grows as the density of data points increases. When comparing with SSC-BP, we can see that SSC-OMP is outperformed. This matches our analysis that the condition for SSC-OMP to give a subspace-preserving representation is stronger (i.e., more difficult to satisfy).

[Figure 1 panels: (a) Subspace-preserving representation percentage; (b) Subspace-preserving representation error; (c) Connectivity; (d) Clustering accuracy; (e) Running time.]

Figure 1. Performance of SSC-OMP and SSC-BP on synthetic data. The data are drawn from 5 subspaces of dimension 6 in ambient dimension 9. Each subspace contains the same number of points and the overall number of points is varied from 150 to $10^5$ and is shown in log scale. For SSC-BP, however, the maximum number of points tested is 6,000 due to time limits. Notice that the bottom right figure also uses log scale in the y-axis.

From a subspace clustering perspective, we are more interested in how well the method performs in terms of clustering accuracy, as well as how efficient the method is in terms of running time. These results are plotted in Figure 1(d) and 1(e), together with the connectivity in Figure 1(c). We first observe that SSC-OMP does not have as good a connectivity as SSC-BP. This could be partly due to the fact that it has fewer correct connections in the first place, as shown by the subspace-preserving percentage. In clustering accuracy, SSC-OMP is also outperformed by SSC-BP. This comes as no surprise, as the sparse representations produced by SSC-OMP are neither as subspace-preserving nor as well connected as those of SSC-BP. However, we observe that as the density of data points increases, the difference in clustering accuracy decreases, and SSC-OMP seems to achieve arbitrarily good clustering accuracy for large $N$. Also, it is evident from Figure 1(e) that SSC-OMP is significantly faster: it is 3 to 4 orders of magnitude faster than SSC-BP when clustering 6,000 points. We conclude that as $N$ increases, the difference in clustering accuracy between SSC-OMP and SSC-BP shrinks, yet SSC-OMP is significantly faster, which makes it preferable for large-scale problems.


5.2. Clustering Images of Handwritten Digits 

In this experiment, we evaluate the performance of different subspace clustering methods on clustering images of handwritten digits. We use the MNIST dataset [20], which contains greyscale images of handwritten digits 0-9.

In each experiment, $N_i \in \{50, 100, 200, 400, 600\}$ images are randomly chosen for each of the 10 digits. For each image, we compute a feature vector using a scattering convolution network [6]. The feature vector is a concatenation of the coefficients in each layer of the network, and is translation invariant and deformation stable. Each feature vector is of size 3,472. The feature vectors for all images are then projected to dimension 500 using PCA. The subspace clustering techniques are then applied to the projected features. The results are reported in Table 1.
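The PCA projection step can be sketched with a thin SVD. This is a hedged illustration: the text does not say whether the features are mean-centered before PCA, so centering here is an assumption, and the scattering features themselves are not computed (`F` stands for a hypothetical `3472 x num_points` feature matrix).

```python
import numpy as np


def pca_project(F, p):
    """Project the columns of F (feature_dim x num_points) onto the top-p principal directions."""
    if p > min(F.shape):
        raise ValueError("p cannot exceed min(feature_dim, num_points)")
    Fc = F - F.mean(axis=1, keepdims=True)   # center each feature (an assumption)
    U, _, _ = np.linalg.svd(Fc, full_matrices=False)
    return U[:, :p].T @ Fc                     # p x num_points
```

With the setting above one would call `pca_project(F, 500)`; the thin SVD (`full_matrices=False`) avoids forming a full $3472 \times 3472$ factor.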

The numbers show that both SSC-OMP and SSC-BP give a much smaller subspace-preserving representation error than all other methods, with SSC-BP being better than SSC-OMP. This is consistent with our theoretical analysis, as there is no guarantee that LSR or LRSC give a subspace-preserving representation for non-independent subspaces, and SSC-BP has a higher probability of giving a subspace-preserving representation than SSC-OMP.

In terms of clustering accuracy, SSC-OMP is better than SSC-BP, which in turn outperforms LSR and LRSC, while SCC performs the worst among the algorithms tested.

Table 1. Performance of subspace clustering methods on the MNIST dataset. The data consist of a randomly chosen number $N_i \in \{50, 100, 200, 400, 600\}$ of images for each of the 10 digits (i.e., 0-9), with features extracted from a scattering network and projected to dimension 500 using PCA.

| No. points | 500 | 1000 | 2000 | 4000 | 6000 |
|---|---|---|---|---|---|
| e%: subspace-preserving representation error | | | | | |
| SSC-OMP | 42.13 | 38.73 | 36.20 | 34.22 | 33.22 |
| SSC-BP | 29.56 | 24.88 | 21.07 | 17.80 | 16.08 |
| LSR | 78.24 | 79.68 | 80.83 | 81.75 | 82.18 |
| LRSC | 81.33 | 81.99 | 82.67 | 83.15 | 83.27 |
| SCC | 89.89 | 89.87 | 89.85 | 89.81 | 89.81 |
| a%: average clustering accuracy | | | | | |
| SSC-OMP | 83.64 | 86.67 | 90.60 | 91.22 | 91.25 |
| SSC-BP | 83.01 | 84.06 | 85.58 | 86.00 | 85.60 |
| LSR | 75.84 | 78.42 | 78.09 | 79.06 | 79.91 |
| LRSC | 75.02 | 79.76 | 79.44 | 78.46 | 79.88 |
| SCC | 53.45 | 61.47 | 66.43 | 71.46 | 70.60 |
| t (sec.): running time | | | | | |
| SSC-OMP | 2.7 | 11.4 | 93.8 | 410.4 | 760.9 |
| SSC-BP | 20.1 | 97.9 | 635.2 | 4533 | 13605 |
| LSR | 1.7 | 5.9 | 42.4 | 136.1 | 327.6 |
| LRSC | 1.9 | 6.4 | 43.0 | 145.6 | 312.9 |
| SCC | 31.2 | 48.5 | 101.3 | 235.2 | 366.8 |

Considering the running time of the methods, SSC-BP requires much more computation, especially when the number of points is large. Although SSC-OMP is an iterative method, its computation time is at most about three times that of LSR and LRSC, which have closed-form solutions. This again makes the proposed method well suited to large-scale problems.

5.3. Clustering Face Images with Varying Lighting 

In this experiment, we evaluate the performance of different subspace clustering methods on the Extended Yale B dataset [18], which contains frontal face images of 38 individuals under 64 different illumination conditions, each of size 192 × 168. In this case, the data points are the original face images downsampled to 48 × 42 pixels. In each experiment, we randomly pick $n \in \{2, 10, 20, 30, 38\}$ individuals and take all of their images (under different illuminations) as the data to be clustered.
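The text does not specify how the 192 × 168 images are reduced to 48 × 42; one simple possibility, assumed here, is averaging non-overlapping 4 × 4 blocks and then stacking the pixels into a 2016-dimensional data point. `block_downsample` is a hypothetical name.

```python
import numpy as np


def block_downsample(img, factor=4):
    """Average non-overlapping factor x factor blocks of a 2-D image."""
    h, w = img.shape
    if h % factor or w % factor:
        raise ValueError("image size must be divisible by factor")
    return img.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))


face = np.random.default_rng(0).random((192, 168))  # stand-in for one Yale B image
x = block_downsample(face).reshape(-1)              # 48 * 42 = 2016-dimensional data point
```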

The clustering performance of the different methods is reported in Table 2. In terms of subspace-preserving recovery, SSC-BP performs slightly better than SSC-OMP in all cases. The other three methods have very large subspace-preserving representation errors, especially when the number of subjects is $n \ge 10$. In terms of clustering accuracy, all methods do fairly well when the number of clusters is 2, except for SCC, which is far worse than the others. As the number of subjects increases from 10 to 38, LSR and LRSC can only maintain an accuracy of about 63% to 68%, and SCC is even worse, but SSC-OMP and SSC-BP maintain reasonably good performance, although their accuracy also degrades gradually. SSC-BP performs better when the number of subjects is 2 or 10, but SSC-OMP performs better when $n > 10$.

Table 2. Performance of subspace clustering methods on the EYaleB dataset. 'NA' indicates that the solver returned an error. The data consist of face images under 64 different illumination conditions of $n \in \{2, 10, 20, 30, 38\}$ randomly picked individuals. Images are downsampled from size 192 × 168 to size 48 × 42 and used as the feature vectors (data points).

| No. subjects | 2 | 10 | 20 | 30 | 38 |
|---|---|---|---|---|---|
| e%: subspace-preserving representation error | | | | | |
| SSC-OMP | 4.14 | 13.62 | 16.80 | 18.66 | 20.13 |
| SSC-BP | 2.70 | 10.33 | 12.67 | 13.74 | 14.64 |
| LSR | 22.77 | 67.07 | 79.52 | 84.94 | 87.57 |
| LRSC | 26.87 | 69.76 | 80.58 | 85.56 | 88.02 |
| SCC | 48.70 | NA | NA | 96.57 | 97.25 |
| a%: average clustering accuracy | | | | | |
| SSC-OMP | 99.18 | 86.09 | 81.55 | 78.27 | 77.59 |
| SSC-BP | 99.45 | 91.85 | 79.80 | 76.10 | 68.97 |
| LSR | 96.77 | 62.89 | 67.17 | 67.79 | 63.96 |
| LRSC | 94.32 | 66.98 | 66.34 | 67.49 | 66.78 |
| SCC | 78.91 | NA | NA | 14.15 | 12.80 |
| t (sec.): running time | | | | | |
| SSC-OMP | 0.6 | 8.3 | 31.1 | 63.7 | 108.6 |
| SSC-BP | 49.1 | 228.2 | 554.6 | 1240 | 1851 |
| LSR | 0.1 | 0.8 | 3.1 | 8.3 | 15.9 |
| LRSC | 1.1 | 1.9 | 6.3 | 14.8 | 26.5 |
| SCC | 50.0 | NA | NA | 520.3 | 750.7 |

6. Conclusion and Future Work 

We studied the sparse subspace clustering algorithm based on OMP. We derived theoretical conditions under which SSC-OMP is guaranteed to give a subspace-preserving representation. Our conditions are broader than those of state-of-the-art methods based on $\ell_2$ or nuclear norm regularization, and slightly more restrictive than those of SSC-BP. Experiments on synthetic and real-world datasets showed that SSC-OMP is much more accurate than state-of-the-art methods based on $\ell_2$ or nuclear norm regularization and about twice as slow. On the other hand, SSC-OMP is slightly less accurate than SSC-BP but orders of magnitude faster. Moreover, ours is one of the few works [1, 2] that have demonstrated subspace clustering experiments on 100,000 points. Overall, SSC-OMP provides the best accuracy versus computation trade-off for large-scale subspace clustering problems. We note that while the optimization algorithm for SSC-BP in [15] is inefficient for large-scale problems, our most recent work [39] presents a scalable algorithm for elastic net based subspace clustering. A comparison with this work is left for future research.
Appendices

In the appendices, we provide proofs for the theoretical re¬ 
sults in the paper. We also provide the parameters of all 
the clustering methods studied in the handwritten digits and 
face image clustering experiments. 

A. Proof of Theorem 1

In Theorem 1, we claim that SSC-OMP gives subspace-preserving representations if the subspaces are independent. Here we provide the proof.

Theorem. If the subspaces are independent, OMP gives a subspace-preserving representation of every data point.

Proof. Consider a data point $x_j \in S_i$. We need to show that the output of OMP($X_{-j}$, $x_j$) is subspace-preserving. As an assumption, the termination parameters in OMP are set to $\epsilon = 0$ and $k_{\max} = N - 1$ (i.e., the total number of points in the dictionary $X_{-j}$). This means, in particular, that OMP always terminates at some iteration $k^* \le N - 1$ with $q_{k^*} = 0$, which can be seen as follows. If the OMP algorithm computes $q_k = 0$ for some $k \le N - 2$, then there is nothing to prove. Thus, to complete the proof, we suppose that $q_k \neq 0$ for all $0 \le k \le N - 2$, and proceed to prove that $q_{N-1} = 0$. In the OMP algorithm, the columns of $X_{-j}$ indexed by $T_k$ are always linearly independent, for any $k$. This is evident from step 4 of Algorithm 1: the residual vector $q_k$ is orthogonal to every column of $X_{-j}$ indexed by $T_k$, so when choosing a new entry to be added to $T_k$ in step 3 of Algorithm 1, points that are linearly dependent on the points indexed by $T_k$ would have zero inner product with $q_k$ and would not be picked. Since all of the columns of $X_{-j}$ have been added by iteration $N - 1$, we know that the columns of $X_{-j}$ are linearly independent and must contain at least $d_i$ linearly independent vectors from $S_i$.¹ We conclude that $q_{k^*} = q_{N-1} = 0$ with $k^* = N - 1$, as claimed. In light of this result and denoting $T^* := T_{k^*}$, it follows from $q_{k^*} = 0$ that $P_{T^*} x_j = x_j$ by line 4 of Algorithm 1, so that $x_j$ is in the range of the matrix $X_{T^*}$, which denotes the columns of $X_{-j}$ indexed by $T^*$.

As a consequence of the previous paragraph, the final output of OMP, given by
$$c^* = \arg\min_{c:\ \operatorname{Supp}(c) \subseteq T^*} \|x_j - X_{-j} c\|_2,$$
will satisfy $x_j = X_{-j} c^*$. We rewrite it as
$$x_j - \sum_{m \in T^*:\ x_m \in S_i} x_m c^*_m = \sum_{m \in T^*:\ x_m \notin S_i} x_m c^*_m. \quad \text{(A.1)}$$
Observe that the left-hand side of (A.1) is in subspace $S_i$, while the right-hand side is in subspace $S_{-i} := \sum_{\ell \neq i} S_\ell$. By the assumption that the set of all subspaces is independent, we know that $S_i$ and $S_{-i}$ are also independent, so they intersect only at the origin. As a consequence, we have
$$0 = \sum_{m \in T^*:\ x_m \notin S_i} x_m c^*_m = \sum_{m:\ x_m \notin S_i} x_m c^*_m, \quad \text{(A.2)}$$
where we also used the fact that $c^*_m = 0$ for all $m \notin T^*$. Combining (A.2) with the earlier fact that the columns of $X_{-j}$ indexed by $T_k$ are linearly independent for all $k$ (this includes $k = k^*$), we know that
$$c^*_m = 0 \quad \text{if } x_m \notin S_i \text{ and } m \in T^*. \quad \text{(A.3)}$$
Finally, we use this to prove that $c^*$ is subspace-preserving. To this end, suppose that $c^*_m \neq 0$, which from the definition of $c^*$ means that $m \in T^*$. Using this fact, $c^*_m \neq 0$, and (A.3) allows us to conclude that $x_m \in S_i$. Thus the solution $c^*$ is subspace-preserving. □

¹ We make the assumption that there are enough samples on each subspace. More specifically, $\forall i$, $\forall x_j \in S_i$, $\operatorname{rank}(X^i_{-j}) = \dim(S_i)$.
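To make the steps of Algorithm 1 referenced in this proof concrete, here is a minimal NumPy sketch of OMP as used in SSC-OMP: initialize $q_0 = x_j$ and $T_0 = \emptyset$, add the column most correlated with the residual, re-fit by least squares, and stop when $\|q_k\|_2 \le \epsilon$ or $k = k_{\max}$. It is an illustrative reimplementation, not the authors' code; the helper names are hypothetical, and a small positive `eps` replaces the exact $\epsilon = 0$ of the proof to absorb floating-point round-off.

```python
import numpy as np


def omp(X, x, eps=1e-8, k_max=None):
    """Orthogonal matching pursuit of x over the columns of X; returns coefficients."""
    M = X.shape[1]
    k_max = M if k_max is None else k_max
    support = []                  # T_k
    q = x.copy()                  # q_0 = x
    coef = np.zeros(0)
    while np.linalg.norm(q) > eps and len(support) < k_max:
        scores = np.abs(X.T @ q)   # |x_m^T q_k| for every column m
        scores[support] = -np.inf  # guard against reselecting an atom due to round-off
        support.append(int(np.argmax(scores)))
        # Least-squares fit on T_{k+1}; the residual equals (I - P_T) x.
        coef, *_ = np.linalg.lstsq(X[:, support], x, rcond=None)
        q = x - X[:, support] @ coef
    c = np.zeros(M)
    c[support] = coef
    return c


def ssc_omp_coefficients(X, eps=1e-8, k_max=None):
    """Column j holds the OMP coefficients of x_j over the dictionary X_{-j}."""
    N = X.shape[1]
    C = np.zeros((N, N))
    for j in range(N):
        others = np.delete(np.arange(N), j)
        C[others, j] = omp(X[:, others], X[:, j], eps, k_max)
    return C


def is_subspace_preserving(C, labels, tol=1e-6):
    """True if every coefficient larger than tol links points with the same label."""
    labels = np.asarray(labels)
    rows, cols = np.nonzero(np.abs(C) > tol)
    return bool(np.all(labels[rows] == labels[cols]))
```

On randomly drawn independent subspaces (for example, three 2-dimensional subspaces of $\mathbb R^6$ with unit-norm points), `is_subspace_preserving(ssc_omp_coefficients(X), labels)` is expected to return `True`, in line with the theorem; `tol` decides which round-off-sized coefficients count as zero.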

B. Proof of Lemma 1

In this section, we provide a detailed proof of Lemma 1. The proof follows straightforwardly by inductively comparing the steps of the procedure OMP($X_{-j}$, $x_j$) with those of the fictitious problem OMP($X^i_{-j}$, $x_j$). The idea is that these two procedures follow the same "path" if the condition of the lemma is satisfied.

Lemma. OMP gives a subspace-preserving representation for point $x_j \in S_i$ in at most $d_i$ iterations if
$$\forall w \in \mathcal W_j, \quad \max_{k \neq i} \max_{x \in \mathcal X^k} |w^\top x| < \max_{x \in \mathcal X^i_{-j}} |w^\top x|. \quad \text{(B.1)}$$

Proof. Let $k^*$ be the number of iterations computed by the procedure OMP($X_{-j}$, $x_j$), so that $q_{k^*} = 0$ (this was established in the first paragraph of the proof of Theorem 1). We prove that the solution to OMP($X_{-j}$, $x_j$) is subspace-preserving by showing that $T_{k^*}$ only contains indices of points from the $i$-th subspace. This is shown by induction, by proving that $T_k$ contains only points from the $i$-th subspace for every $0 \le k \le k^*$.

The set of residual directions $\mathcal W_j$ introduced in Definition 3 plays an essential role in this proof. For notational clarity, we denote by $\tilde q_k$ the residual vector generated at iteration $k$ of OMP($X^i_{-j}$, $x_j$) (note that this is the fictitious problem). The residual vectors of OMP($X_{-j}$, $x_j$) are denoted by $q_k$. In the induction, we also show that OMP($X^i_{-j}$, $x_j$) does not terminate at any $k < k^*$, and that $\tilde q_k = q_k$ whenever $k \le k^*$.

First, in the case $k = 0$, the claim that $T_0$ only contains indices of points from subspace $i$ is trivially satisfied, since $T_0$ is empty. Also, $\tilde q_0 = q_0$ holds because both are set to $x_j$ in line 1 of Algorithm 1.

Now, given that $\tilde q_k = q_k$ for some $k < k^*$ and that $T_k$ contains points only from subspace $S_i$, we show that $\tilde q_{k+1} = q_{k+1}$ and that $T_{k+1}$ contains indices of points only from subspace $i$. This can be shown by noticing that the entry added in step 3 of Algorithm 1 is given by $\arg\max_m |x_m^\top q_k|$. Here, since $\tilde q_k = q_k$ (and $q_k \neq 0$ because $k < k^*$), we have that $q_k / \|q_k\|_2$ is in the set $\mathcal W_j$. Then, by condition (B.1), the arg max gives an index that corresponds to a point in $S_i$. This guarantees that $T_{k+1}$ only contains points from subspace $S_i$. Moreover, the picked point is evidently the same as the point picked at iteration $k$ of OMP($X^i_{-j}$, $x_j$). It then follows from step 4 of Algorithm 1 that the resulting residuals, $q_{k+1}$ and $\tilde q_{k+1}$, are also equal. In the case $k + 1 < k^*$, this means that $\tilde q_{k+1} = q_{k+1} \neq 0$, so the fictitious problem OMP($X^i_{-j}$, $x_j$) does not terminate at this step. This finishes the induction.

The fact that OMP terminates in at most $d_i$ iterations follows from the following facts: (i) we have established that OMP($X_{-j}$, $x_j$) produces the same computations as OMP($X^i_{-j}$, $x_j$); (ii) the vectors selected by OMP($X^i_{-j}$, $x_j$) are linearly independent and contained in subspace $S_i$; and (iii) the dimension of $S_i$ is equal to $d_i$. □
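Condition (B.1) can be checked numerically for a given point by running the fictitious problem OMP($X^i_{-j}$, $x_j$), collecting its normalized residuals $\mathcal W_j$, and comparing correlations inside and outside $S_i$. The sketch below assumes unit-norm columns and integer subspace labels; the helper names are hypothetical, and `eps` is a round-off tolerance standing in for exact termination.

```python
import numpy as np


def residual_directions(Xi, x, eps=1e-10):
    """Normalized nonzero residuals q_k / ||q_k|| of OMP(Xi, x), i.e. the set W_j."""
    support, q, dirs = [], x.copy(), []
    while np.linalg.norm(q) > eps and len(support) < Xi.shape[1]:
        dirs.append(q / np.linalg.norm(q))
        scores = np.abs(Xi.T @ q)
        scores[support] = -np.inf
        support.append(int(np.argmax(scores)))
        coef, *_ = np.linalg.lstsq(Xi[:, support], x, rcond=None)
        q = x - Xi[:, support] @ coef
    return np.array(dirs).reshape(-1, x.shape[0])


def lemma1_condition(X, labels, j, eps=1e-10):
    """Check (B.1) for point j: each w in W_j correlates more with X^i_{-j} than with X^k, k != i."""
    labels = np.asarray(labels)
    idx = np.arange(len(labels))
    same = np.flatnonzero((labels == labels[j]) & (idx != j))
    other = np.flatnonzero(labels != labels[j])
    W = residual_directions(X[:, same], X[:, j], eps)
    if W.shape[0] == 0 or other.size == 0:
        return True  # nothing can be violated
    inside = np.max(np.abs(W @ X[:, same]), axis=1)
    outside = np.max(np.abs(W @ X[:, other]), axis=1)
    return bool(np.all(outside < inside))
```

Because every residual of the fictitious problem lies in $S_i$, the number of rows returned by `residual_directions` never exceeds $d_i$ in exact arithmetic, mirroring the iteration bound of the lemma.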

C. Proof of Lemma 2 

In this section, we prove Lemma 2 in Section 3. 

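Lemma. Let $x_j \in S_i$. Then, for all $w \in \mathcal W_j$, we have:
$$\max_{k \neq i} \max_{x \in \mathcal X^k} |w^\top x| \le \max_{k \neq i} \mu(\mathcal W^i, \mathcal X^k) \le \max_{k \neq i} \mu(\mathcal X^i, \mathcal X^k) / r_i,$$
$$\max_{x \in \mathcal X^i_{-j}} |w^\top x| \ge r(\mathcal P^i_{-j}) \ge r_i. \quad \text{(C.1)}$$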


Proof. Two of the inequalities need proofs while the other 
two follow directly from definitions. 

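For the first one, we prove that $\max_{k \neq i} \mu(\mathcal W^i, \mathcal X^k) \le \max_{k \neq i} \mu(\mathcal X^i, \mathcal X^k) / r_i$. To do this, it suffices to show that $\mu(\mathcal W^i, \mathcal X^k) \le \mu(\mathcal X^i, \mathcal X^k) / r_i$ for any $k \neq i$. Notice that any point $w$ in $\mathcal W^i$ is in the subspace $S_i$, so it can be written as a linear combination of the points in $\mathcal X^i$, i.e., $w = X^i c$ for some $c$. Specifically, we pick the $c$ given by the following optimization program:
$$c = \arg\min_c \|c\|_1 \ \text{ s.t. } \ w = X^i c. \quad \text{(C.2)}$$
Using (C.2), defining $Y := X^{k\top} X^i$, letting the $\ell$-th row of $Y$ be denoted by $y_\ell^\top$, and using Hölder's inequality, we can observe that
$$\|X^{k\top} w\|_\infty = \|X^{k\top} X^i c\|_\infty = \max_\ell |y_\ell^\top c| \le \max_\ell \|y_\ell\|_\infty \|c\|_1 \le \mu(\mathcal X^i, \mathcal X^k)\, \|c\|_1,$$
where the last step holds because every entry of $Y$ is an inner product between a point of $\mathcal X^k$ and a point of $\mathcal X^i$.

To proceed, we need a bound on $\|c\|_1$. As $c$ is defined by (C.2), such a bound exists and is given by (see, e.g., Lemma B.2 in [31])
$$\|c\|_1 \le \|w\|_2 / r(\mathcal P^i) = 1 / r(\mathcal P^i),$$
where $\mathcal P^i := \operatorname{conv}(\pm \mathcal X^i)$, and we use the fact that every point in the set of residual directions $\mathcal W^i$ has unit norm by definition. Now, by the definitions, $r(\mathcal P^i) \ge r(\mathcal P^i_{-j}) \ge r_i$, thus $\|c\|_1 \le 1/r(\mathcal P^i) \le 1/r_i$, which gives
$$\|X^{k\top} w\|_\infty \le \mu(\mathcal X^i, \mathcal X^k) / r_i. \quad \text{(C.3)}$$
Finally, since (C.3) holds for any $w$ in $\mathcal W^i$, it follows that $\mu(\mathcal W^i, \mathcal X^k) \le \mu(\mathcal X^i, \mathcal X^k) / r_i$.

For the second part, we prove that for all $w \in \mathcal W_j$, $\max_{x \in \mathcal X^i_{-j}} |w^\top x| \ge r(\mathcal P^i_{-j})$, or equivalently, $\|X^{i\top}_{-j} w\|_\infty \ge r(\mathcal P^i_{-j})$. The proof relies on the result (see Definition 7.2 in [29]) that for an arbitrary vector $y \in S_i$,
$$\|X^{i\top}_{-j} y\|_\infty \le 1 \implies \|y\|_2 \le 1 / r(\mathcal P^i_{-j}).$$
Suppose, for contradiction, that $\|X^{i\top}_{-j} w\|_\infty < r(\mathcal P^i_{-j})$. Then $\|X^{i\top}_{-j} w\|_\infty = r(\mathcal P^i_{-j}) - \epsilon > 0$ for some $\epsilon > 0$, and
$$\Big\|X^{i\top}_{-j} \frac{w}{r(\mathcal P^i_{-j}) - \epsilon}\Big\|_\infty = 1 \le 1 \implies \Big\|\frac{w}{r(\mathcal P^i_{-j}) - \epsilon}\Big\|_2 \le \frac{1}{r(\mathcal P^i_{-j})} \implies \|w\|_2 \le \frac{r(\mathcal P^i_{-j}) - \epsilon}{r(\mathcal P^i_{-j})} < 1,$$
which contradicts the fact that $w$ is normalized. □

D. Proof of Theorem 2 and Corollary 1

We explicitly show the proof of Theorem 2 and Corollary 1. They follow from the previous two lemmas.

Theorem. The output of OMP is subspace preserving if
$$\forall i = 1, \dots, n, \quad \max_{k:\ k \neq i} \mu(\mathcal W^i, \mathcal X^k) < r_i. \quad \text{(D.1)}$$

Corollary. The output of OMP is subspace preserving if
$$\forall i = 1, \dots, n, \quad \max_{k:\ k \neq i} \mu(\mathcal X^i, \mathcal X^k) < r_i^2. \quad \text{(D.2)}$$

Proof. Notice from Lemma 1 that the solution of SSC-OMP for $x_j \in S_i$ is subspace preserving if
$$\forall w \in \mathcal W_j, \quad \max_{k \neq i} \max_{x \in \mathcal X^k} |w^\top x| < \max_{x \in \mathcal X^i_{-j}} |w^\top x|. \quad \text{(D.3)}$$
Lemma 2 provides bounds for both sides of (D.3): by (C.1), the left-hand side is at most $\max_{k \neq i} \mu(\mathcal W^i, \mathcal X^k)$ and the right-hand side is at least $r_i$, so (D.1) implies (D.3); moreover, since $\max_{k \neq i} \mu(\mathcal W^i, \mathcal X^k) \le \max_{k \neq i} \mu(\mathcal X^i, \mathcal X^k) / r_i$, (D.2) implies (D.1). This proves the theorem and the corollary. □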


E. Proof of Theorem 3

Theorem. Assume a random model where all subspaces are of equal dimension $d$ and the number of data points in each subspace is $\rho d + 1$, where $\rho > 1$ is the "density", so the total number of data points in all subspaces is $N = n(\rho d + 1)$. The output of OMP is subspace preserving with probability $p > 1 - \frac{2d}{N} - N e^{-\sqrt{\rho} d}$ if
$$d < \frac{c^2(\rho) \log \rho}{12} \, \frac{D}{\log N}, \quad \text{(E.1)}$$
where $c(\rho) > 0$ is a constant that depends only on $\rho$.

Proof. The proof goes by providing bounds for the left- and right-hand sides of the inequality in Theorem 2, copied here for convenience of reference:
$$\forall i = 1, \dots, n, \quad \max_{k:\ k \neq i} \mu(\mathcal W^i, \mathcal X^k) < r_i. \quad \text{(E.2)}$$
We first give a bound on the inradius $r_i$. Denote
$$\bar r = \frac{c(\rho)}{\sqrt 2} \sqrt{\frac{\log \rho}{d}} \quad \text{and} \quad p_r = N e^{-\sqrt{\rho} d},$$
in which $c(\rho)$ is a numerical constant depending on $\rho$. [29] shows that since points in each subspace are independently distributed, it holds that
$$P\{r_i > \bar r \text{ for all } i\} \ge 1 - p_r.$$
Next we give a bound on the coherence. From an upper bound on the area of a spherical cap [3, 29], we have that if $x, y \in \mathbb R^D$ are two random vectors that are distributed uniformly and independently on the unit sphere, then
$$P\Big\{|\langle x, y\rangle| > \sqrt{\tfrac{6 \log N}{D}}\Big\} \le \frac{2}{N^3}. \quad \text{(E.3)}$$
Under the random model, points $x \in \mathcal X^k$, $\forall k$, are distributed uniformly at random on the unit sphere of $\mathbb R^D$ by assumption. Any residual point $w \in \mathcal W^i$ also has a uniform distribution on the unit sphere, as it depends only on points in $\mathcal X^i$, which are independent and uniformly distributed. Furthermore, any pair of points $x \in \mathcal X^k$ and $w \in \mathcal W^i$ are distributed independently, because points in $\mathcal X^k$ and $\mathcal X^i$ are independent. Thus the result of (E.3) is applicable here. Since there are at most $d \times N^2$ pairs of inner products in $\mu(\mathcal W^i, \mathcal X^k)$, by using the union bound we get
$$P\{\mu(\mathcal W^i, \mathcal X^k) < \bar\mu \text{ for all } i,\ k \neq i\} \ge 1 - p_\mu d N^2,$$
where we have defined
$$\bar\mu = \sqrt{\frac{6 \log N}{D}} \quad \text{and} \quad p_\mu = \frac{2}{N^3}.$$
As a consequence, if condition (E.1) holds, then $\bar\mu < \bar r$. Applying the union bound again, we get that condition (E.2) holds with probability $p > 1 - p_\mu d N^2 - p_r = 1 - \frac{2d}{N} - N e^{-\sqrt{\rho} d}$. This finishes the proof. □

F. Comparison with prior work on SSC-OMP

In Theorem 2 we give a sufficient condition guaranteeing that SSC-OMP is subspace-preserving:
$$\forall i = 1, \dots, n, \quad \max_{k:\ k \neq i} \mu(\mathcal W^i, \mathcal X^k) < r_i, \quad \text{(F.1)}$$
and in Corollary 1 a stronger sufficient condition:
$$\forall i = 1, \dots, n, \quad \max_{k:\ k \neq i} \mu(\mathcal X^i, \mathcal X^k) < r_i^2. \quad \text{(F.2)}$$

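Prior to this work, [12] gave another sufficient condition for SSC-OMP to give a subspace-preserving representation, namely,
$$\max_{k:\ k \neq i} \mu(\mathcal X^i, \mathcal X^k) < r_i - \sqrt{2 - 2r_i}\, \max_{k:\ k \neq i} \cos\theta^*_{ik}, \quad \text{(F.3)}$$
in which the subspace angle is defined as
$$\theta^*_{ik} = \min_{\substack{x \in S_i,\ y \in S_k \\ \|x\|_2 = \|y\|_2 = 1}} \arccos\langle x, y\rangle. \quad \text{(F.4)}$$
We claim that Theorem 2 in this work is a stronger result than that provided in [12], as the sufficient condition (F.3) implies (F.1). Here we give a rigorous argument for this claim.

Notice that the inequality in (F.3) implies that, for all $k \neq i$,
$$\mu(\mathcal X^i, \mathcal X^k) < r_i - \sqrt{2 - 2r_i}\, \cos\theta^*_{ik}; \quad \text{(F.5)}$$
see Lemma 1 in their paper. We show that condition (F.5) implies (F.1) when $r_i < 1/2$, and implies condition (F.2) when $r_i \ge 1/2$, which means that their result is weaker than our result based on condition (F.1). Both cases use the fact that, since $\mathcal W^i \subset S_i$ and $\mathcal X^k \subset S_k$ consist of unit vectors, $\mu(\mathcal W^i, \mathcal X^k) \le \cos\theta^*_{ik}$ and $\mu(\mathcal X^i, \mathcal X^k) \le \cos\theta^*_{ik}$.

Case 1. If $r_i < 1/2$, then $\sqrt{2 - 2r_i} > 1$, thus
$$\text{(F.5)} \implies \mu(\mathcal X^i, \mathcal X^k) < r_i - \cos\theta^*_{ik} \implies \cos\theta^*_{ik} < r_i \implies \mu(\mathcal W^i, \mathcal X^k) < r_i,$$
so (F.1) holds.

Case 2. If $r_i \ge 1/2$, then
$$\begin{aligned}
\text{(F.5)} &\implies \mu(\mathcal X^i, \mathcal X^k) < r_i - \sqrt{2 - 2r_i}\, \mu(\mathcal X^i, \mathcal X^k) \\
&\implies \mu(\mathcal X^i, \mathcal X^k) < r_i / (1 + \sqrt{2 - 2r_i}) \\
&\implies \mu(\mathcal X^i, \mathcal X^k) < r_i / (1 + (2 - 2r_i)) \\
&\implies \mu(\mathcal X^i, \mathcal X^k) < r_i^2 \implies \text{(F.2)} \implies \text{(F.1)}.
\end{aligned}$$
Here the third step uses $\sqrt{2 - 2r_i} \ge 2 - 2r_i$ whenever $2 - 2r_i \le 1$, and the fourth uses $r_i(3 - 2r_i) \ge 1$, i.e., $(2r_i - 1)(r_i - 1) \le 0$, which holds for $1/2 \le r_i \le 1$.

So condition (F.1) is implied by (F.3).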
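As a numerical sanity check (not a proof) of the Case 2 chain, the snippet below evaluates $r/(1+\sqrt{2-2r})$, $r/(3-2r)$ and $r^2$ on a fine grid of $r \in [1/2, 1]$ and confirms that they are ordered as claimed, with equality at both endpoints. The function name `case2_chain` is hypothetical.

```python
import numpy as np


def case2_chain(r):
    """The three bounds compared in Case 2: r/(1+sqrt(2-2r)), r/(3-2r) and r^2."""
    a = r / (1 + np.sqrt(2 - 2 * r))
    b = r / (3 - 2 * r)
    c = r ** 2
    return a, b, c


r = np.linspace(0.5, 1.0, 10001)
a, b, c = case2_chain(r)
tol = 1e-12  # absorbs round-off where the bounds coincide (r = 1/2 and r = 1)
chain_holds = bool(np.all(a <= b + tol) and np.all(b <= c + tol))
```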

G. Parameters for real experiments 

To make our results reproducible, we report the parameters used for all the methods in the experiments. For SSC-OMP, we set $\epsilon$ in Algorithm 1 to be $10^{-\ldots}$, and $k_{\max}$ to be the true subspace dimension in the synthetic experiments, 10 in digit clustering and 5 in face clustering. For LSR, we use "LSR2" in [25] with regularization $\lambda = 60$ for digit clustering and $\lambda = 0.3$ for face clustering. For LRSC, we use model "P3" in [34], with parameters $\tau = \alpha = 0.1$ for digit clustering and $\tau = \alpha = 150$ for face clustering. For SCC, we use dimension $d = 8$ for digit clustering and 5 for face clustering. We use $\ell_1$-Magic for SSC-BP in the synthetic experiments. For the real data, we use the noisy variation of SSC-BP in [15, sec. 3.1] for digit clustering, with $\lambda_z = \ldots/\mu_z$, and the sparse outlying entries variation of SSC-BP in [15, sec. 3.1] for face clustering, with $\lambda_e = 30/\mu_e$. For all algorithms, these constants were chosen to optimize performance.

For a fair comparison, we allow standard pre- and post-processing to be used whenever they improve the clustering accuracy. For preprocessing, we allow normalization of the original data points using the $\ell_2$ norm, and for post-processing, we allow normalization of the coefficient vectors using the norm. For experiments on synthetic data, we do not use any pre- or post-processing. In digit clustering, preprocessing is applied to SSC-BP and SCC, and post-processing is used for SSC-OMP and SSC-BP. For face clustering, preprocessing is applied to SSC-OMP, LSR, LRSC and SCC, while post-processing is used for SSC-BP and LRSC.